Genome


Nature. Author manuscript; available in PMC 2010 Aug 13.
Published in final edited form as: Nature. 2007 Aug 2; 448(7153): 553–560. Published online 2007 Jul 1. doi: 10.1038/nature06008
PMCID: PMC2921165 NIHMSID: NIHMS119563 PMID: 17603471

Genome-wide maps of chromatin state in pluripotent and lineage-committed cells

Tarjei S. Mikkelsen,1,2 Manching Ku,1,3 David B. Jaffe,1 Biju Issac,1,3 Erez Lieberman,1,2 Georgia Giannoukos,1 Pablo Alvarez,1 William Brockman,1 Tae-Kyung Kim,4 Richard P. Koche,1,2,3 William Lee,1 Eric Mendenhall,1,3 Aisling O’Donovan,3 Aviva Presser,1 Carsten Russ,1 Xiaohui Xie,1 Alexander Meissner,5 Marius Wernig,5 Rudolf Jaenisch,5 Chad Nusbaum,1 Eric S. Lander,1,5,* and Bradley E. Bernstein1,3,6,*

1 Broad Institute of Harvard and MIT, Cambridge, MA 02139 USA
2 Division of Health Sciences and Technology, MIT, Cambridge, MA 02139, USA
3 Molecular Pathology Unit and Center for Cancer Research, Massachusetts General Hospital, Charlestown, MA 02129 USA
4 Department of Neurology, Children’s Hospital, Boston, MA 02115 USA
5 Whitehead Institute for Biomedical Research, MIT, Cambridge, MA 02139 USA
6 Department of Pathology, Harvard Medical School, Boston, MA 02115 USA

* These authors co-supervised the work.

Correspondence and requests for materials should be addressed to E.S.L. (lander@broad.mit.edu) or B.E.B. (bbernstein@partners.org).

The publisher's final edited version of this article is available at Nature. See commentary "The epigenomic era opens" in Nature, volume 448 on page 548.

Supplementary Materials: NIHMS119563-supplement-2.pdf (5.8M)

Abstract

We report the application of single molecule-based sequencing technology for high-throughput profiling of histone modifications in mammalian cells. By obtaining over 4 billion bases of sequence from chromatin immunoprecipitated DNA, we generated genome-wide chromatin state maps of mouse embryonic stem cells, neural progenitor cells and embryonic fibroblasts. We find that lysine 4 and lysine 27 tri-methylation effectively discriminate genes that are expressed, poised for expression, or stably repressed, and therefore reflect cell state and lineage potential. Lysine 36 tri-methylation marks primary coding and non-coding transcripts, facilitating gene annotation. Lysine 9 and lysine 20 tri-methylation are detected at satellite, telomeric and active long-terminal repeats, and can spread into proximal unique sequences. Lysine 4 and lysine 9 tri-methylation mark imprinting control regions. Finally, we show that chromatin state can be read in an allele-specific manner by using single nucleotide polymorphisms. This study provides a framework for the application of comprehensive chromatin profiling towards characterization of diverse mammalian cell populations.

Introduction

One of the fundamental mysteries of biology is the basis of cellular state. Although their genomes are essentially identical, cell types in a multicellular organism maintain strikingly different behaviors that persist over extended periods. The most extreme case is lineage-commitment during development, where cells progress from totipotency to pluripotency to terminal differentiation; each step involves establishment of a stable state encoding specific developmental commitments that can be faithfully transmitted to daughter cells. Considerable evidence suggests that cellular state may be closely related to ‘chromatin state’ – that is, modifications to histones and other proteins that package the genome [1–3]. Accordingly, it would be desirable to construct ‘chromatin state maps’ for a wide variety of cell types, showing the genome-wide distribution of important chromatin modifications.

Chromatin state can be studied by chromatin immunoprecipitation (ChIP), in which an antibody is used to enrich DNA from genomic regions carrying a specific epitope. The major challenge to generating genome-wide chromatin state maps lies in characterizing these enriched regions in a scalable manner. Enrichment at individual loci is commonly assayed by PCR, but this method does not scale efficiently. A more recent approach has been ChIP-chip, in which enriched DNA is hybridized to a microarray [4,5]. This technique has been successfully used to study large genomic regions. However, ChIP-chip suffers from inherent technical limitations: (i) it requires large amounts (several micrograms) of DNA and thus involves extensive amplification, which introduces bias; (ii) it is subject to cross-hybridization which hinders study of repeated sequences and allelic variants; and (iii) it is currently expensive to study entire mammalian genomes. Given these issues, only a handful of whole-genome ChIP-chip studies in mammals have been reported.

In principle, chromatin could be readily mapped across the genome by sequencing ChIP DNA and identifying regions that are over-represented among these sequences. Importantly, sequence-based mapping could require relatively small quantities of DNA and provide nucleotide-level discrimination of similar sequences, thereby maximizing genome coverage. The major limitation has been that high-resolution mapping requires millions of sequences (Supplementary Note 1). This is cost-prohibitive with traditional technology, even with concatenation of multiple sequence tags [6]. However, recent advances in single molecule-based sequencing (SMS) technology promise to dramatically increase throughput and decrease costs [7]. In the approach developed by Illumina/Solexa, DNA molecules are arrayed across a surface, locally amplified, subjected to successive cycles of primer-mediated single-base extension (using fluorescently-labeled reversible terminators) and imaged after each cycle to determine the inserted base. The ‘read length’ is short (25–50 bases), but tens of millions of DNA fragments may be read simultaneously.

Here, we report the development of a method for mapping ChIP enrichment by sequencing (ChIP-Seq) and describe its application to create chromatin-state maps for pluripotent and lineage-committed mouse cells. The resulting data (1) define three broad categories of promoters based on their chromatin state in ES cells, including a larger than anticipated set of ‘bivalent’ promoters; (2) reveal that lineage commitment is accompanied by characteristic chromatin changes at bivalent promoters that parallel changes in gene expression and transcriptional competence; (3) demonstrate the potential for using ChIP for genome-wide annotation of novel promoters and primary transcripts, active transposable elements, imprinting control regions and allele-specific transcription. This study provides a technological framework for comprehensive characterization of chromatin-state across diverse mammalian cell populations.

Genome-wide chromatin state maps

We created genome-wide chromatin state maps for three mouse cell types: ES cells, neural progenitor cells (NPCs) [8] and embryonic fibroblasts (MEFs). For each cell type, we prepared and sequenced ChIP DNA samples for some or all of the following features: pan-H3, H3K4me3, H3K9me3, H3K27me3, H3K36me3, H4K20me3 and RNA polymerase II (Supplementary Table 1).

In each case, we sequenced nanogram quantities of DNA fragments (~300 bp) on a Solexa 1GGA sequencer. We obtained an average of 10 million successful reads, consisting of the terminal 27–36 bases of each fragment. The reads were mapped to the genome and used to determine the number of ChIP fragments overlapping any given position (Fig. 1). Enriched intervals were defined as regions where this number exceeded a threshold defined by randomization (see Methods). The full data set consists of 18 chromatin-state maps, containing ~140 million uniquely aligned reads, representing over 4 billion bases of sequence.
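The counting and thresholding steps described above can be sketched in a few lines of Python. This is an illustrative sketch under stated assumptions, not the pipeline used in the study: the function names, the single-chromosome simplification, the uniform-shuffling null model and the `alpha` tail fraction are all hypothetical, and the actual randomization procedure is the one given in the Methods. Each read is extended to the approximate fragment length (~300 bp) in the direction of its strand, fragments are counted per position, and intervals exceeding a threshold derived from randomized reads are reported.

```python
import random
from collections import Counter

FRAGMENT_LEN = 300  # approximate ChIP fragment size quoted in the text


def fragment_coverage(reads, chrom_len, fragment_len=FRAGMENT_LEN):
    """Number of inferred fragments overlapping each position.

    `reads` is a list of (position, strand) pairs; a '+' read marks the left
    end of its fragment and a '-' read marks the right end.
    """
    diff = [0] * (chrom_len + 1)
    for pos, strand in reads:
        if strand == "+":
            start, end = pos, pos + fragment_len
        else:
            start, end = pos - fragment_len + 1, pos + 1
        start, end = max(start, 0), min(end, chrom_len)
        if start < end:
            diff[start] += 1
            diff[end] -= 1
    coverage, running = [], 0
    for delta in diff[:chrom_len]:
        running += delta
        coverage.append(running)
    return coverage


def randomized_threshold(n_reads, chrom_len, alpha=1e-3, n_rounds=5, seed=0):
    """Smallest count t such that at most a fraction `alpha` of positions
    exceed t when the same number of reads is placed uniformly at random
    (an assumed null model)."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(n_rounds):
        shuffled = [(rng.randrange(chrom_len), rng.choice("+-"))
                    for _ in range(n_reads)]
        counts.update(fragment_coverage(shuffled, chrom_len))
    total = n_rounds * chrom_len
    tail = total  # positions with coverage > t, updated as t grows
    for t in range(max(counts) + 1):
        tail -= counts[t]
        if tail / total <= alpha:
            return t
    return max(counts)


def enriched_intervals(coverage, threshold):
    """Half-open (start, end) runs where coverage exceeds the threshold."""
    intervals, start = [], None
    for pos, depth in enumerate(coverage):
        if depth > threshold and start is None:
            start = pos
        elif depth <= threshold and start is not None:
            intervals.append((start, pos))
            start = None
    if start is not None:
        intervals.append((start, len(coverage)))
    return intervals
```

In use, `fragment_coverage` would be run on the aligned reads of one sample, `randomized_threshold` on the same read count and chromosome length, and `enriched_intervals` on the two results.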

Figure 1. Comparison of ChIP-Seq and ChIP-chip data

Direct comparison of H3K4me3 (green) and H3K27me3 (red) ChIP data across a 300 kb region in mouse ESCs from independent experiments assayed by SMS (absolute fragment counts) or tiling arrays (log p-values for enrichment relative to whole-cell extracts [15]).

We validated the chromatin state maps by computational analysis and by comparison to previous methods. ChIP-Seq maps of specific histone modifications show marked enrichment at specific locations in the genome, while the pan-H3 and unenriched samples show relatively uniform distributions (Supplementary Fig. 1–2). The maps show close agreement with our previously reported ChIP-chip data from ~2.5% of the mouse genome [9] (Fig. 1). Also, ChIP-PCR assays of 50 sites chosen to represent a range of ChIP-Seq fragment counts showed 98% concordance and a strong, quantitative correlation (Supplementary Fig. 3; Supplementary Table 2).

Promoter state in ES and lineage-committed cells

We began our analysis by studying H3K4me3 and H3K27me3 patterns at known promoters. H3K4me3 is catalyzed by trithorax-group (trxG) proteins and associated with activation, while H3K27me3 is catalyzed by Polycomb-group (PcG) proteins and associated with silencing [10,11]. Recently, we and others observed that some promoters in ES cells carry both H3K4me3 and H3K27me3 [9,12]. We termed this novel combination a ‘bivalent’ chromatin mark and proposed that it serves to poise key developmental genes for lineage-specific activation or repression.

We studied 17,762 promoters inferred from full-length cDNAs (Supplementary Table 3). Mammalian RNA Polymerase II promoters are known to occur in at least two major forms [13,14] (Supplementary Fig. 4). CpG-rich promoters are associated with both ubiquitously expressed ‘housekeeping’ genes, and genes with more complex expression patterns, particularly those expressed during embryonic development. CpG-poor promoters are generally associated with highly tissue-specific genes. Accordingly, we divided our analysis to focus on high CpG promoters (HCP; n=11,410) and low CpG promoters (LCP; n=3,014) separately. To ensure a clean separation, we defined a set of intermediate CpG content promoters (ICP; n=3,338); this class shows properties consistent with being a mixture of the two major classes.
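A three-way split of this kind can be illustrated with a sliding-window scan of each promoter sequence. The sketch below is a hedged example: this section does not state the cut-offs, so the window size (500 bp), the G+C fraction cut-off (0.55) and the CpG observed/expected cut-offs (0.6 and 0.4) are assumptions chosen for illustration, and all function names are hypothetical.

```python
def gc_fraction(seq):
    """Fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def cpg_obs_exp(seq):
    """Observed/expected CpG ratio: (#CG * length) / (#C * #G)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return seq.count("CG") * len(seq) / (c * g)


def classify_promoter(seq, window=500, step=5,
                      hcp_gc=0.55, hcp_oe=0.6, lcp_oe=0.4):
    """Return 'HCP', 'ICP' or 'LCP' for one promoter sequence.

    Assumed rule: HCP if any window is both GC-rich and CpG-rich;
    LCP if no window reaches the lower CpG cut-off; ICP otherwise.
    """
    last_start = max(1, len(seq) - window + 1)
    any_moderate = False
    for start in range(0, last_start, step):
        chunk = seq[start:start + window]
        oe = cpg_obs_exp(chunk)
        if oe >= hcp_oe and gc_fraction(chunk) >= hcp_gc:
            return "HCP"
        if oe >= lcp_oe:
            any_moderate = True
    return "ICP" if any_moderate else "LCP"
```

As a consistency check on the numbers quoted above, the three class sizes (11,410 + 3,338 + 3,014) sum to the 17,762 promoters studied.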

High CpG promoters in ES cells

Virtually all HCPs (99%) are associated with intervals of significant H3K4me3 enrichment in ES cells (Fig. 2a). The modified histones are typically confined to a punctate interval of 1–2 kb (Supplementary Fig. 5). As observed previously [15,16], there is a strong correlation between the intensity of H3K4me3 and the expression level of the associated genes (Spearman’s ρ=0.67). However, not all promoters associated with H3K4me3 are active.

Figure 2. Histone tri-methylation state predicts expression of HCP and LCP promoters

(a) Mammalian promoters can be readily classified into sets with high (HCPs), intermediate (ICPs) or low (LCPs) CpG-content. In ES cells (ESCs), virtually all HCPs are marked by H3K4me3, either alone (green) or in combination with H3K27me3 (yellow). In contrast, most LCPs have neither mark (grey). Few promoters are only enriched for H3K27me3 (red). (b) Tri-methylation states of HCPs and LCPs in NPCs (indicated by colors), conditional on their ESC state (indicated below each bar). HCPs marked by H3K4me3 only in ESCs tend to retain this mark. HCPs marked by H3K4me3 and H3K27me3 tend to lose one or both marks, although some remain bivalent. Small, partially overlapping subsets of LCPs are marked by H3K4me3. (c) Tri-methylation states of HCPs and LCPs in MEFs. (d) Changes in expression levels of HCP genes with H3K4me3 alone (left) or also with H3K27me3 (right) upon differentiation to NPCs. Resolution of bivalent promoters to H3K4me3 is associated with increased expression. Boxplots show median (red bar), 25th and 75th percentile expression levels in ESCs. Whiskers show 2.5th and 97.5th percentiles. Asterisks indicate classes with less than 15 genes. (e) Changes in expression levels of LCP genes with H3K4me3 (left) or no mark (right) upon differentiation to NPCs. Gain of H3K4me3 is associated with increased expression.

The chromatin state maps reveal that ~22% of HCPs (n=2,525) are actually bivalent, exhibiting both H3K4me3 and H3K27me3 (Fig. 2a). A minority (n=564) are ‘wide’ bivalent sites in which H3K27me3 extends over a region of at least 5 kb and resemble those described previously [9]. The majority (n=1,961) are ‘narrow’ bivalent sites, with more punctate H3K27me3, that correspond to many additional PcG target promoters [17–19]. Bivalent promoters show low activity despite the presence of H3K4me3, suggesting that the repressive effect of PcG activity is generally dominant over the ubiquitous trxG activity (Supplementary Fig. 6; Supplementary Table 4).
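Given enriched intervals for the two marks, assigning each promoter to one of the four states, and splitting bivalent promoters into ‘wide’ and ‘narrow’, reduces to interval overlap tests. The following is a minimal sketch assuming half-open (start, end) coordinates on a single chromosome; the function names and the representation of a promoter as an interval are hypothetical, and only the 5 kb width cut-off comes from the text.

```python
def overlaps(a, b):
    """True if half-open intervals a and b share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def promoter_state(promoter, k4_intervals, k27_intervals):
    """Classify a promoter as 'H3K4me3', 'H3K27me3', 'bivalent' or 'none'."""
    has_k4 = any(overlaps(iv, promoter) for iv in k4_intervals)
    has_k27 = any(overlaps(iv, promoter) for iv in k27_intervals)
    if has_k4 and has_k27:
        return "bivalent"
    if has_k4:
        return "H3K4me3"
    if has_k27:
        return "H3K27me3"
    return "none"


def bivalent_class(promoter, k27_intervals, wide_bp=5000):
    """'wide' if an overlapping H3K27me3 interval spans at least `wide_bp`,
    'narrow' if one overlaps but is shorter, None if none overlaps."""
    widths = [end - start for start, end in k27_intervals
              if overlaps((start, end), promoter)]
    if not widths:
        return None
    return "wide" if max(widths) >= wide_bp else "narrow"
```

The counts quoted above are consistent with this partition: 564 wide plus 1,961 narrow sites give the 2,525 bivalent HCPs.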

The different types of chromatin marks at HCP promoters are closely related to the nature of the associated genes (Supplementary Table 5). Monovalent promoters (H3K4me3) generally regulate genes with ‘housekeeping’ functions including replication and basic metabolism. By contrast, bivalent promoters are associated with genes with more complex expression patterns, including key developmental transcription factors, morphogens and cell surface molecules. In addition, several bivalent promoters appear to regulate transcripts for lineage-specific microRNAs.

High CpG promoters in NPCs and MEFs

The vast majority of HCPs marked with H3K4me3 alone in ES cells retain this mark both in NPCs and MEFs (92% in each; Fig. 2b, 2c, 3a). This is consistent with the tendency for this sub-class of promoters to regulate ubiquitous housekeeping genes. A small proportion (~4%) of these promoters have H3K27me3 in MEFs, and are thus bivalent or marked by H3K27me3 alone. This correlates with lower expression levels and may reflect active recruitment of PcG proteins to new genes during differentiation [20]. An example is the transcription factor gene Sox2, where the promoter is marked by H3K4me3 alone in ES cells and NPCs, but H3K27me3 alone in MEFs. Notably, this locus is flanked by CpG islands with bivalent markings in ES cells (see below), suggesting the locus may be poised for repression upon differentiation.

Figure 3. Cell type-specific chromatin marks at promoters

(a) Multiple ‘housekeeping genes’, such as DNA Polymerase mu (Polm), are associated with HCPs marked by H3K4me3 in all cell types. (b) The neural transcription factor gene Olig1 (HCP) is bivalent in ESCs, but resolves to H3K4me3 in NPCs and H3K27me3 in MEFs. (c) The neurogenesis transcription factor gene Neurog1 (HCP) remains bivalent upon differentiation to NPCs, but resolves to H3K27me3 in MEFs. (d) The adipogenesis transcription factor gene Ppar-γ (HCP) remains bivalent in MEFs, but loses both marks in NPCs. (e) The neural progenitor marker gene Fabp7 (LCP) is marked by H3K4me3 in NPCs only. (f) The brain and lung expressed transcription factor gene Foxp2 is associated with an HCP that is bivalent in ES cells, but resolves to H3K4me3 in NPCs and remains bivalent in MEFs. (g) Foxp2 also has an LCP marked by H3K4me3 in MEFs only. (h) Multiple, distinct bivalent chromatin marks at the variable region promoters of Pcdh-γ. A promoter proximal to the constant region exons (*) is marked by H3K4me3 only.

The majority of HCPs with bivalent marks in ES cells resolve to a monovalent status in the committed cells. In NPCs, 46% resolve to H3K4me3 only and these genes show increased expression (Fig. 2b, 2d, 3b). Of the remaining promoters, 14% resolve to H3K27me3 alone and 32% lose both marks, with both outcomes being associated with low levels of expression. Importantly, 8% remain bivalent and these genes also continue to be repressed (Fig. 2b, 2d, 3c). A somewhat less resolved pattern is seen in MEFs, with 32% marked by H3K4me3 alone, 22% marked by H3K27me3 alone, 3% without both marks, and the remaining (43%) still bivalent (Fig. 2c). The relatively high number of bivalent promoters in MEFs may reflect a less differentiated state and/or heterogeneity in the population.
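Percentages of this kind follow from tabulating, for every promoter in a given ES-cell state, the state it occupies in the committed cell type. A minimal sketch, assuming per-gene state labels are already available as dictionaries (the function name and label strings are hypothetical):

```python
from collections import Counter


def resolution_table(esc_states, committed_states, start_state="bivalent"):
    """Percentage of promoters in `start_state` in ES cells that end up in
    each state in a committed cell type. Both arguments map gene -> state."""
    outcomes = Counter(
        committed_states[gene]
        for gene, state in esc_states.items()
        if state == start_state and gene in committed_states
    )
    total = sum(outcomes.values())
    if total == 0:
        return {}
    return {state: 100.0 * n / total for state, n in outcomes.items()}


# Toy input built from the HCP examples in the Figure 3 legend.
esc = {"Polm": "H3K4me3", "Olig1": "bivalent", "Neurog1": "bivalent",
       "Pparg": "bivalent", "Foxp2": "bivalent"}
npc = {"Polm": "H3K4me3", "Olig1": "H3K4me3", "Neurog1": "bivalent",
       "Pparg": "none", "Foxp2": "H3K4me3"}
npc_table = resolution_table(esc, npc)
```

With only four bivalent genes the toy percentages are not meaningful in themselves; the point is the shape of the calculation. The percentages reported above for NPCs (46, 14, 32 and 8) and for MEFs (32, 22, 3 and 43) each sum to 100, as the rows of such a table should.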

Distinct regulation of Low CpG Promoters

The LCPs show a strikingly different pattern than the HCPs. Only a small minority (6.5%, n=207) of LCPs have significant H3K4me3 in ES cells and virtually none have H3K27me3 (Fig. 2a). Most of these promoters have lost H3K4me3 in NPCs and MEFs, while a small number of other LCPs (1.5% and 2.6%, respectively) have gained the mark (Fig. 2b, 2c, 3e). In all three cell types, the expression levels of the associated genes strongly correlate with presence or absence of H3K4me3 (Fig. 2e; Supplementary Fig. 6).

The genes with LCPs marked by H3K4me3 are closely related to tissue-specific functions. In NPCs, they include genes encoding several known markers of neural progenitors in vivo (such as Fabp7, Cp, Gpr56). In MEFs, they include genes encoding extracellular matrix components and growth factors (such as Col3a1, Col6a1, Postn, Aspn, Hgf, Figf), consistent with the mesenchymal origin of these cells (see below).

We conclude that HCPs and LCPs are subject to distinct modes of regulation. In ES cells, all HCPs appear to be targets of trxG activity, and may therefore drive transcription unless actively repressed by PcG proteins. In committed cell types, a subset of HCPs appear to lose the capacity to recruit trxG activity (possibly due to other epigenetic modifications, such as DNA methylation [21]). In contrast, CpG-poor promoters appear to be inactive by default, independent of repression by PcG proteins, and may instead be selectively activated by cell type- or tissue-specific factors.

Alternative promoter use

We note that genes with alternative promoters may have multiple, distinct chromatin states. An ‘active’ state at any one of these may be sufficient to drive expression. A common situation involves genes with one major HCP and one or more alternative LCPs. An example is the transcription factor Foxp2, which is expressed at moderate levels in both NPCs and MEFs (Fig. 3f,g). The Foxp2 HCP is marked by H3K4me3 in NPCs, but is bivalent in MEFs. However, an alternative LCP is marked by H3K4me3 exclusively in MEFs. The protocadherin-γ (Pcdh-γ) locus is a more extreme case: the N-terminal variable regions of this gene are transcribed from at least 20 different HCPs in neurons [22], all of which carry bivalent chromatin marks in ES cells. Pcdh-γ expression is nevertheless detected by microarrays, possibly due to a single promoter in front of the C-terminal constant region marked by H3K4me3 alone (Fig. 3h).

Although only ~10% of the genes analyzed here have more than one known promoter, recent ‘cap-trapping’ studies suggest that alternative promoter use may be substantially more common [23]. The ability of ChIP-Seq to assess chromatin state at known promoters, as well as to identify novel promoters (see below), should prove valuable in analysis of transcriptional networks.

Promoter state reflects lineage commitment and potential

Given their association with epigenetic memory, we next examined whether the patterns of H3K4me3 and H3K27me3 can reflect developmental potential. Both of the committed cell types studied here have been shown to be multipotent ex vivo. NPCs can be differentiated to glial and neuronal lineages [8], while primary MEFs have been differentiated into adipocytes [24], chondrocytes [25] and osteoblast-like cells [26].

Lineage-specific resolution and retention of bivalent marks

We first examined a set of genes involved in in vivo differentiation pathways known to be, at least partially, recapitulated by MEFs, NPCs or neither. These genes all have bivalent promoters in ES cells. We found that their resolution in lineage-committed cells is closely related to their demonstrated developmental potential (Supplementary Table 6):

Genes restricted to regulation or specialized functions in unrelated lineages, such as hematopoietic (Cdx4, PU.1), epithelial (Cncf, Krt2–4), endoderm (Gata6, Pdx1) or germ line (Tenr, Ctcfl), have generally resolved to monovalent H3K27me3 or carry neither mark in both NPCs and MEFs.

Genes related to adipogenesis and chondro/osteogenesis often remain bivalent in MEFs, but not in NPCs. Examples include Ppar-γ, which is a key regulator of adipogenesis, and Sp7, which promotes chondro/osteogenic pathways. Early mesenchymal markers, such as Runx1 and Sox9, resolved to H3K4me3 alone in MEFs.

Genes related to gliogenesis and neurogenesis often resolve to H3K4me3 alone or remain bivalent in NPCs, while resolving to H3K27me3 alone in the MEFs. Gliogenesis and neurogenesis are thought to be mutually opposing pathways [27], and we find that genes promoting gliogenesis are more likely to resolve to H3K4me3 in NPCs. Examples include Bmp2 and the miRNA mir-9-3, which promotes glial but inhibits neuronal differentiation [28]. Several genes known to promote neuronal differentiation, such as Neurog1 and Neurog2, remain bivalent while others, such as Bmp6, appear to resolve to H3K27me3 alone. In our hands, the NPCs differentiate to astrocytes with significantly higher efficiency than to neurons (M. Wernig, unpublished data). The observed chromatin patterns may reflect this gliogenic bias.

Correlation with expression in adult tissues

We next analyzed gene expression in adult tissues with major contributions from neuroectodermal or mesenchymal lineages. We reasoned that if H3K4me3 is generally not restored once lost, then differential loss of H3K4me3 at promoters early in these lineages (as represented by NPCs and MEFs, respectively) might be reflected in differential gene expression patterns in related adult tissues.

Strikingly, we observed a clear bias in relative expression levels between relevant adult tissues for genes that retain H3K4me3 in NPCs only versus genes that retain H3K4me3 in MEFs only. The former are strongly biased toward higher expression in various brain sections, while the latter are biased towards higher expression in bone, adipose and other mesenchyme-rich tissues (Fig. 4).

Figure 4. Correlation between chromatin state changes and lineage expression

Relative expression levels across adult mouse brain (frontal and cerebral cortex, substantia nigra, cerebellum, amygdala, hypothalamus, hippocampus) and relatively mesenchyme-rich tissues (bone, white fat, brown fat, trachea, digits, lung, bladder, uterus, umbilical cord) are shown for genes with bivalent chromatin marks in ES cells that retain H3K4me3 in NPCs but lose this mark in MEFs (n=62) or vice versa (n=160). Red, white and blue indicate higher, equal and lower relative expression, respectively.

These analyses are of course limited by alternative promoter usage, the cell models used, and the heterogeneity of the adult tissues. Nonetheless, the data show clear trends that support an important role for retention and resolution of bivalent chromatin in the regulation of hierarchical lineage commitment.

Genome-wide annotation of promoters and primary transcripts

We next considered genome-wide maps of H3K36me3. This mark has been linked to transcriptional elongation and may serve to prevent aberrant initiation within gene bodies [29–33]. Our chromatin maps reveal a global pattern of H3K36me3 in mammals similar to that previously observed in yeast [29].

In all three cell types, H3K36me3 is strongly enriched across the transcribed regions of active genes (Fig. 5a), beginning immediately after the promoter H3K4me3 signal. The level of H3K36me3 is strongly correlated with the level of gene expression (Spearman’s ρ=0.77), although the dynamic range is compressed (1–2 orders of magnitude for H3K36me3 vs 3–4 for expression levels; Supplementary Fig. 7). Genes with bivalent promoters rarely show H3K36me3, consistent with their low expression. Notably, there is essentially no overlap between intervals significantly enriched for H3K36me3 and for H3K27me3, consistent with a role for PcG complexes in the exclusion of polymerases [11].
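Spearman’s ρ is a natural statistic here because it depends only on ranks and is therefore unaffected by the compressed dynamic range of the histone signal. The sketch below is a plain-Python illustration with hypothetical function names (average ranks for ties, then Pearson correlation of the ranks); it is not the analysis code used in the study.

```python
def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Toy data: expression spans four orders of magnitude, the mark only two,
# yet the rank correlation is perfect because the relation is monotonic.
expression = [1.0, 10.0, 100.0, 1000.0, 10000.0]
k36_signal = [value ** 0.5 for value in expression]
rho = spearman(expression, k36_signal)
```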

Figure 5. H3K4me3 and H3K36me3 annotate genes and non-coding RNA transcripts

(a) Foxp1 has two annotated promoters (based on RefSeq and UCSC Known Genes), only one of which shows H3K4me3 in ES cells. The corresponding transcriptional unit is marked by H3K36me3. In MEFs, H3K36me3 extends an additional 500 kb upstream to an H3K4me3 site that appears to reflect an alternate promoter (this site is bivalent in ES cells). (b) H3K36me3 enrichment extends significantly downstream of Sox2. Though highly active in ES cells, Sox2 is flanked by two bivalent CpG islands that may poise it for repression. (c) H3K4me3 and H3K36me3 indicate two highly expressed non-coding RNAs, and (d) the putative primary transcript (dashed line) for a single annotated microRNA.

The vast majority of intervals significantly enriched for H3K36me3 are associated with known genes (~92% in ESCs), but there are at least ~500 additional regions across the genome (median size ~2 kb), with most being adjacent to sites of H3K4me3. Inspection revealed a number of interesting cases, falling into three categories.

The first category corresponds to H3K36me3 that extends significantly upstream from the annotated start of a known gene, often until an H3K4me3 site. These appear to reflect the presence of unannotated alternate promoters. A notable example is the Foxp1 locus. In ES cells, one annotated Foxp1 promoter is marked by H3K4me3 and another CpG-rich region located ~500 kb upstream carries a bivalent mark. In MEFs, this CpG island is marked by H3K4me3 only, and H3K36me3 extends from this site to the 3’ end of Foxp1 (Fig. 5a). Although no transcript extending across this entire region has been reported in mouse, the orthologous position in human has been shown to act as a promoter for the orthologous gene. The ChIP-Seq data contain many other examples where the combination of H3K36me3 and H3K4me3 appears to reveal novel promoters.

The second category corresponds to H3K36me3 that extends significantly downstream of a known gene. An example is the Sox2 locus, which encodes a pluripotency-associated transcription factor that also functions during neural development. In ES cells, Sox2 has an unusually large region of H3K4me3 (>20 kb) accompanied by H3K36me3 extending far beyond the annotated 3’-end (>15 kb); non-coding transcription throughout the locus has been noted previously [34] and may serve a regulatory role (Fig. 5b).

The third category appears to reflect transcription of non-coding RNA genes. For example, two regions with H3K36me3 and adjacent H3K4me3 correspond to recently discovered nuclear transcripts with possible functions in mRNA processing [35] (Fig. 5c). In addition, a number of these presumptive transcriptional units overlap microRNAs (Fig. 5d). A striking example is a >200 kb interval within the Dlk1-Dio3 imprinted locus (Fig. 6a). This region harbors over 40 non-coding RNAs, including clusters of microRNAs and small nucleolar RNAs [36]. The ChIP-Seq data suggest that the entire region is transcribed as a single unit that initiates at an H3K4me3 marked HCP.

Figure 6. Allele-specific histone methylation and genic H3K9me3/H4K20me3

(a) H3K4me3 and H3K36me3 indicate a primary microRNA transcript in the Dlk1-Dio3 locus. The allele-specificity of this transcript is read out using ChIP-Seq data for hybrid ES cells and a SNP catalogue. The H3K36me3 reads overwhelmingly correspond to maternal 129 alleles, consistent with the known maternal expression of these microRNAs [36]. (b) In contrast, a non-imprinted transcript shows roughly equal proportions of reads assigned to 129 and castaneus alleles. (c) Peg13 is marked by H3K4me3 and H3K9me3 in ES cells; 19 of 21 H3K4me3 reads correspond to the paternal castaneus allele, while 6 of 6 H3K9me3 reads correspond to the maternal 129 allele, consistent with paternal expression of this gene. (d) H3K9me3 and H4K20me3 enrichment evident at the Polrmt gene may reflect transcriptional interference due to antisense transcription from the 3’ UTR CpG island of Hcn2 (see text).
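The allele-specific read-out in panels (a) to (c) can be illustrated in two steps: assign each read to a parental allele from the SNPs it covers, then ask whether the split between alleles departs from 1:1. The sketch below is an assumption-laden illustration, not the authors' procedure: the SNP dictionary layout, the function names and the choice of an exact two-sided binomial test are all hypothetical.

```python
from math import comb


def assign_allele(read_start, read_seq, snps):
    """Return '129', 'CAST', or None for one aligned read.

    `snps` maps genomic position -> (129 base, castaneus base). A read is
    assigned only if every informative SNP it covers agrees on one allele.
    """
    votes = set()
    for offset, base in enumerate(read_seq):
        alleles = snps.get(read_start + offset)
        if alleles is None:
            continue
        if base == alleles[0]:
            votes.add("129")
        elif base == alleles[1]:
            votes.add("CAST")
    return votes.pop() if len(votes) == 1 else None


def binomial_two_sided_p(k, n, p=0.5):
    """Exact two-sided binomial p-value: total probability of all outcomes
    that are no more likely than the observed count k out of n."""
    probs = [comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))


# Read counts quoted in panel (c) for Peg13.
p_k4 = binomial_two_sided_p(19, 21)  # H3K4me3: 19 of 21 reads paternal
p_k9 = binomial_two_sided_p(6, 6)    # H3K9me3: 6 of 6 reads maternal
```

Under these assumptions the 19-of-21 split gives a p-value of about 2.2 × 10^-4, while 6 of 6 gives about 0.03, which shows how quickly confidence grows with the number of SNP-overlapping reads.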

These findings suggest that genome-wide maps of H3K4me3 and H3K36me3 may provide a general tool for defining novel transcription units. The capacity to define the origins and extents of primary transcripts will be of particular value for characterizing the regulation of microRNAs and other non-coding RNAs that are rapidly processed from long precursors [37]. Finally, the relatively narrow dynamic range of H3K36me3 may offer advantages over RNA-based approaches in assessing gene expression and defining cellular states.

H3K9 and H4K20 tri-methylation associated with specific repetitive elements

We next studied H3K9me3 and H4K20me3, both of which have been associated with silencing of centromeres, transposons and tandem repeats38–40. We sought first to assess the relative enrichments of H3K9me3 and H4K20me3 across different types of repetitive elements by aligning ChIP-Seq reads directly to consensus sequences for various repeat families (~40 million reads could be aligned this way).
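The per-family comparison described above can be sketched as follows. This is a minimal illustration and not the authors' pipeline: it assumes reads have already been aligned to the consensus sequences and counted per repeat family, and that enrichment is taken as the ratio of library-size-normalized ChIP counts to counts from a control library. The function name `repeat_enrichment`, the pseudocount, and the notion of a control library are assumptions made for illustration.

```python
from typing import Dict


def repeat_enrichment(
    chip_counts: Dict[str, int],
    control_counts: Dict[str, int],
    chip_total: int,
    control_total: int,
    pseudocount: float = 1.0,
) -> Dict[str, float]:
    """Return a ChIP/control enrichment ratio for each repeat family.

    Counts are divided by the total number of aligned reads in each
    library so that libraries of different depth are comparable.
    The pseudocount (an assumption) avoids division by zero.
    """
    enrichment = {}
    for family in sorted(set(chip_counts) | set(control_counts)):
        chip_rate = (chip_counts.get(family, 0) + pseudocount) / chip_total
        control_rate = (control_counts.get(family, 0) + pseudocount) / control_total
        enrichment[family] = chip_rate / control_rate
    return enrichment
```

A ratio well above 1 for a family would indicate that the mark is over-represented on that family relative to the control under these assumptions.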

H3K9me3 and H4K20me3 show nearly identical patterns of enrichment in ES cells. The strongest enrichments are observed for telomeric, satellite, and long terminal repeats (LTRs). The LTR signal primarily reflects enrichment of intracisternal A-particles (IAP), early transposon (ETn) elements, and the LTRIS sub-family (Supplementary Fig. 8).

IAP and ETn elements are active in murine ES cells and produce double-stranded RNAs41,42. RNA has also been implicated in maintaining satellite and telomeric heterochromatin38. Hence, these enrichment data are consistent with a global role for RNA in targeting repressive chromatin marks in mammalian ES cells, analogous to that observed in lower eukaryotes38,39.

We next examined the distributions of H3K9me3 and H4K20me3 across unique sequence in the mouse genome. We identified ~1800 H3K9me3 sites (median size ~300 bp) in ES cells, with the vast majority also showing H4K20me3. Fully 78% of the sites lie within two kb of a satellite repeat or LTR (primarily IAP and ETn elements). This suggests that repressive marks are capable of spreading from repeat insertions and could potentially regulate proximal unique sequence.
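The proximity statistic above (the fraction of sites within 2 kb of a repeat) can be computed along the following lines. This is a hedged sketch: the interval representation (chromosome, start, end; half-open coordinates), the function names, and the brute-force search are assumptions chosen for clarity rather than a description of the analysis actually used.

```python
from collections import defaultdict
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chromosome, start, end), half-open


def interval_gap(a_start: int, a_end: int, b_start: int, b_end: int) -> int:
    """Gap in bp between two intervals (0 if they overlap or touch)."""
    return max(0, b_start - a_end, a_start - b_end)


def fraction_near_repeats(
    sites: List[Interval], repeats: List[Interval], max_dist: int = 2000
) -> float:
    """Fraction of sites lying within max_dist bp of any repeat."""
    # Group repeats by chromosome so each site is only compared
    # against repeats on its own chromosome.
    by_chrom = defaultdict(list)
    for chrom, start, end in repeats:
        by_chrom[chrom].append((start, end))

    near = 0
    for chrom, start, end in sites:
        if any(
            interval_gap(start, end, r_start, r_end) <= max_dist
            for r_start, r_end in by_chrom.get(chrom, [])
        ):
            near += 1
    return near / len(sites) if sites else 0.0
```

For genome-scale annotation a sorted or indexed interval structure would be preferable; the quadratic comparison here is only meant to make the definition explicit.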

Recent studies have described a handful of active genes with H3K9me3 and H4K20me3, raising the possibility that these ‘repressive’ marks also function in transcriptional activation31,32. One-third of the ~1800 H3K9me3 enriched sites reside within an annotated gene, which is roughly the proportion expected by chance. However, H3K9me3 sites that are larger and/or more distant from LTRs are more likely to occur within genes (Supplementary Fig. 9). The largest genic site in ES cells (~6 kb) coincides with the Polrmt gene (Fig. 6d). This case is notable because the downstream gene (Hcn2) is convergent and contains a CpG island at its 3’ end. Transcription from 3’ promoters has been proposed as a potential mechanism of transcriptional interference by producing antisense transcripts23. This example may therefore reflect a link between transcriptional interference and H3K9me3, as has been suggested for a few other mammalian loci43,44. Our results thus confirm the presence of H3K9me3 within a subset of genes, although the functional implications remain to be elucidated.

Imprinting control regions show overlapping H3K4 and H3K9 tri-methylation

We next studied chromatin marks associated with imprinting. This epigenetic process typically involves allele-specific DNA methylation of CpG-rich imprinting control regions (ICRs)45. Several reports have also described allele-specific chromatin modification at a handful of ICRs, with H3K9me3 and H4K20me3 on the DNA methylated allele and H3K4me3 on the opposite allele46,47.

We searched for regions showing overlapping H3K9me3 and H3K4me3 in ES cells. Strikingly, 13 of the top 20 sites, as ranked by enrichment of the two marks, are located within known imprinted regions, coincident with ICRs or imprinted gene promoters. An example is the Peg13 promoter (Fig. 6c). Conversely, of the ~20 known and putative autosomal imprinted loci that contain ICRs, 17 have at least one with the overlapping chromatin marks (Supplementary Table 5). We conclude that overlapping H3K9me3 and H3K4me3 is a common signature of ICRs in ES cells.
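A search for co-occurring marks of the kind described above can be illustrated with a simple interval intersection. The text does not specify how enrichment of the two marks was combined for ranking, so this sketch assumes that each overlap is scored by the weaker of the two peak scores; the `Peak` type, the function name, and that scoring rule are all hypothetical.

```python
from typing import List, NamedTuple


class Peak(NamedTuple):
    chrom: str
    start: int  # half-open coordinates
    end: int
    score: float  # enrichment score for the mark


def overlapping_sites(k4_peaks: List[Peak], k9_peaks: List[Peak]) -> List[Peak]:
    """Return regions covered by both marks, highest combined score first.

    The combined score is min(score_k4, score_k9), an assumed rule that
    ranks a site highly only when both marks are strongly enriched.
    """
    hits = []
    for k4 in k4_peaks:
        for k9 in k9_peaks:
            same_chrom = k4.chrom == k9.chrom
            if same_chrom and k4.start < k9.end and k9.start < k4.end:
                hits.append(
                    Peak(
                        k4.chrom,
                        max(k4.start, k9.start),
                        min(k4.end, k9.end),
                        min(k4.score, k9.score),
                    )
                )
    return sorted(hits, key=lambda p: p.score, reverse=True)
```

The top entries of the returned list could then be compared against an annotation of known imprinted regions, as was done for the top 20 sites in the text.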

Allele-specific histone methylation

To explore the feasibility of inferring allele-specific chromatin states, we constructed chromatin-state maps in male ES cells derived from a more distant cross (129 (maternal) × M. castaneus (paternal)), and used a catalog of ~3.5 million SNPs to assign ChIP-Seq reads to one of the two parental alleles.
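The read-assignment step can be sketched as below. This is an illustration under stated assumptions, not the method used in the study: reads are taken to be ungapped forward-strand alignments with 0-based start coordinates, the SNP catalog is modeled as a dictionary from (chromosome, position) to the pair (129 base, castaneus base), and a read is treated as informative only if every SNP it covers agrees on one parent. All names are hypothetical.

```python
from typing import Dict, Optional, Tuple

# (chromosome, 0-based position) -> (129 base, castaneus base)
SnpCatalog = Dict[Tuple[str, int], Tuple[str, str]]


def assign_allele(
    chrom: str, start: int, seq: str, snps: SnpCatalog
) -> Optional[str]:
    """Assign an aligned read to '129' or 'CAST', or None if uninformative.

    A read is uninformative if it covers no catalogued SNP, if its base
    matches neither parental allele, or if covered SNPs disagree.
    """
    votes = set()
    for offset, base in enumerate(seq):
        alleles = snps.get((chrom, start + offset))
        if alleles is None:
            continue  # no SNP at this position
        base_129, base_cast = alleles
        if base == base_129:
            votes.add("129")
        elif base == base_cast:
            votes.add("CAST")
    return votes.pop() if len(votes) == 1 else None
```

Counting the assigned labels over all informative reads at a site gives the allelic proportions discussed in the following paragraphs.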

As a positive control, we first compared results for chromosome X and the autosomes for reads derived by H3K4me3 ChIP. Virtually all (97%) of ~3700 informative reads on chromosome X, and roughly half (57%) of the 178,000 informative reads on the autosomes, were assigned to the 129 strain. These proportions correspond roughly to the expected 100% and 50%.

We then examined the allelic distribution at overlapping H3K4me3 and H3K9me3 sites coincident with putative ICRs (see above). Six of the ICRs had enough reads (≥10) containing SNPs to assess allelic bias. In every case, the SNPs showed significant bias in the expected direction (p

Supplementary Information is linked to the online version of the paper at www.nature.com/nature

Author Information

All analyzed data sets can be obtained from www.broad.mit.edu/seq_platform/chip/. Microarray data have been submitted to the GEO repository under accession number GSE8024. Reprints and permissions information is available from www.nature.com/reprints. The authors declare no competing financial interests.

References

1. Surani MA, Hayashi K, Hajkova P. Genetic and epigenetic regulators of pluripotency. Cell. 2007;128:747–762.
2. Bernstein BE, Meissner A, Lander ES. The mammalian epigenome. Cell. 2007;128:669–681.
3. Kouzarides T. Chromatin modifications and their function. Cell. 2007;128:693–705.
4. Buck MJ, Lieb JD. ChIP-chip: considerations for the design, analysis, and application of genome-wide chromatin immunoprecipitation experiments. Genomics. 2004;83:349–360.
5. Mockler TC, et al. Applications of DNA tiling arrays for whole-genome analysis. Genomics. 2005;85:1–15.
6. Roh TY, Cuddapah S, Zhao K. Active chromatin domains are defined by acetylation islands revealed by genome-wide mapping. Genes Dev. 2005;19:542–552.
7. Service RF. Gene sequencing. The race for the $1000 genome. Science. 2006;311:1544–1546.
8. Conti L, et al. Niche-independent symmetrical self-renewal of a mammalian tissue stem cell. PLoS Biol. 2005;3:e283.
9. Bernstein BE, et al. A bivalent chromatin structure marks key developmental genes in embryonic stem cells. Cell. 2006;125:315–326.
10. Ringrose L, Paro R. Epigenetic regulation of cellular memory by the Polycomb and Trithorax group proteins. Annu Rev Genet. 2004;38:413–443.
11. Schuettengruber B, Chourrout D, Vervoort M, Leblanc B, Cavalli G. Genome regulation by polycomb and trithorax proteins. Cell. 2007;128:735–745.
12. Azuara V, et al. Chromatin signatures of pluripotent cell lines. Nat Cell Biol. 2006;8:532–538.
13. Saxonov S, Berg P, Brutlag DL. A genome-wide analysis of CpG dinucleotides in the human genome distinguishes two distinct classes of promoters. Proc Natl Acad Sci U S A. 2006;103:1412–1417.
14. Weber M, et al. Distribution, silencing potential and evolutionary impact of promoter DNA methylation in the human genome. Nat Genet. 2007;39:457–466.
15. Bernstein BE, et al. Genomic maps and comparative analysis of histone modifications in human and mouse. Cell. 2005;120:169–181.
16. Kim TH, et al. A high-resolution map of active promoters in the human genome. Nature. 2005;436:876–880.
17. Boyer LA, et al. Polycomb complexes repress developmental regulators in murine embryonic stem cells. Nature. 2006.
18. Lee TI, et al. Control of developmental regulators by polycomb in human embryonic stem cells. Cell. 2006;125:301–313.
19. Squazzo SL, et al. Suz12 binds to silenced regions of the genome in a cell-type-specific manner. Genome Res. 2006;16:890–900.
20. Pasini D, Bracken AP, Hansen JB, Capillo M, Helin K. The Polycomb Group protein Suz12 is required for Embryonic Stem Cell differentiation. Mol Cell Biol. 2007.
21. Klose RJ, Bird AP. Genomic DNA methylation: the mark and its mediators. Trends Biochem Sci. 2006;31:89–97.
22. Wang X, Su H, Bradley A. Molecular mechanisms governing Pcdh-gamma gene expression: evidence for a multiple promoter and cis-alternative splicing model. Genes Dev. 2002;16:1890–1905.
23. Carninci P, et al. The transcriptional landscape of the mammalian genome. Science. 2005;309:1559–1563.
24. Alexander DL, Ganem LG, Fernandez-Salguero P, Gonzalez F, Jefcoate CR. Aryl-hydrocarbon receptor is an inhibitory regulator of lipid synthesis and of commitment to adipogenesis. J Cell Sci. 1998;111(Pt 22):3311–3322.
25. Lengner CJ, et al. Primary mouse embryonic fibroblasts: a model of mesenchymal cartilage formation. J Cell Physiol. 2004;200:327–333.
26. Garreta E, Genove E, Borros S, Semino CE. Osteogenic differentiation of mouse embryonic stem cells and mouse embryonic fibroblasts in a three-dimensional self-assembling peptide scaffold. Tissue Eng. 2006;12:2215–2227.
27. Doetsch F. The glial identity of neural stem cells. Nat Neurosci. 2003;6:1127–1134.
28. Krichevsky AM, Sonntag KC, Isacson O, Kosik KS. Specific microRNAs modulate embryonic stem cell-derived neurogenesis. Stem Cells. 2006;24:857–864.
29. Rao B, Shibata Y, Strahl BD, Lieb JD. Dimethylation of histone H3 at lysine 36 demarcates regulatory and nonregulatory chromatin genome-wide. Mol Cell Biol. 2005;25:9447–9459.
30. Bannister AJ, et al. Spatial distribution of di- and tri-methyl lysine 36 of histone H3 at active genes. J Biol Chem. 2005;280:17732–17736.
31. Kim A, Kiefer CM, Dean A. Distinctive signatures of histone methylation in transcribed coding and noncoding human beta-globin sequences. Mol Cell Biol. 2007;27:1271–1279.
32. Vakoc CR, Sachdeva MM, Wang H, Blobel GA. Profile of histone lysine methylation across transcribed mammalian chromatin. Mol Cell Biol. 2006;26:9185–9195.
33. Li B, Carey M, Workman JL. The role of chromatin during transcription. Cell. 2007;128:707–719.
34. Fantes J, et al. Mutations in SOX2 cause anophthalmia. Nat Genet. 2003;33:461–463.
35. Hutchinson JN, et al. A screen for nuclear transcripts identifies two linked noncoding RNAs associated with SC35 splicing domains. BMC Genomics. 2007;8:39.
36. Seitz H, et al. A large imprinted microRNA gene cluster at the mouse Dlk1-Gtl2 domain. Genome Res. 2004;14:1741–1748.
37. Cullen BR. Transcription and processing of human microRNA precursors. Mol Cell. 2004;16:861–865.
38. Zaratiegui M, Irvine DV, Martienssen RA. Noncoding RNAs and gene silencing. Cell. 2007;128:763–776.
39. Verdel A, Moazed D. RNAi-directed assembly of heterochromatin in fission yeast. FEBS Lett. 2005;579:5872–5878.
40. Martens JH, et al. The profile of repeat-associated histone lysine methylation states in the mouse epigenome. Embo J. 2005;24:800–812.
41. Baust C, et al. Structure and expression of mobile ETnII retroelements and their coding-competent MusD relatives in the mouse. J Virol. 2003;77:11448–11458.
42. Svoboda P, et al. RNAi and expression of retrotransposons MuERV-L and IAP in preimplantation mouse embryos. Dev Biol. 2004;269:276–285.
43. Cho DH, et al. Antisense transcription and heterochromatin at the DM1 CTG repeats are constrained by CTCF. Mol Cell. 2005;20:483–489.
44. Feng YQ, et al. The human beta-globin locus control region can silence as well as activate gene expression. Mol Cell Biol. 2005;25:3864–3874.
45. Edwards CA, Ferguson-Smith AC. Mechanisms regulating imprinted genes in clusters. Curr Opin Cell Biol. 2007.
46. Delaval K, et al. Differential histone modifications mark mouse imprinting control regions during spermatogenesis. Embo J. 2007;26:720–729.
47. Feil R, Berger F. Convergent evolution of genomic imprinting in plants and mammals. Trends Genet. 2007;23:192–199.
48. Strausberg RL, et al. Generation and initial analysis of more than 15,000 full-length human and mouse cDNA sequences. Proc Natl Acad Sci U S A. 2002;99:16899–16903.

